Bacteria are the unseen majority on our planet, with millions of species and comprising most of the living protoplasm. We propose a novel approach for reconstruction of the composition of an unknown mixture of bacteria using a single Sanger-sequencing reaction of the mixture. Our method is based on compressive sensing theory, which deals with reconstruction of a sparse signal using a small number...
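For readers new to compressive sensing, the reconstruction task above is usually written as a sparse recovery problem. The formulation below is a standard textbook version and is an assumption, not necessarily the exact one the authors use. Here $y \in \mathbb{R}^m$ is the measured signal of the mixed Sanger reaction, the columns of $A \in \mathbb{R}^{m \times n}$ are the expected signals of $n$ candidate reference bacteria (with $m \ll n$), $x$ holds their unknown non-negative abundances, and $\varepsilon$ bounds the measurement noise:

```latex
\hat{x} \;=\; \arg\min_{x \in \mathbb{R}^n} \|x\|_1
\quad \text{subject to} \quad \|Ax - y\|_2 \le \varepsilon,
\qquad x \ge 0 .
```

The $\ell_1$ objective favours solutions in which only a few bacteria have non-zero abundance, which is the sparsity assumption that makes recovery from a single reaction plausible.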
We describe algorithms for incorporating prior sequence knowledge into the candidate generation stage of de novo peptide sequencing by tandem mass spectrometry. We focus on two types of prior knowledge: homology to known sequences encoded by a regular expression or position-specific score matrix, and amino acid content encoded by a multiset of required residues. We show an application to de novo sequencing...
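The two kinds of prior knowledge named above can be illustrated with a small filter over candidate peptides. This is a simplified sketch: it applies the priors as a post-hoc filter, whereas the abstract describes incorporating them into candidate generation itself. The function names `satisfies_priors` and `filter_candidates` are hypothetical, and homology is modelled only as a regular expression (not a position-specific score matrix).

```python
import re
from collections import Counter


def satisfies_priors(peptide: str, motif: str, required: str) -> bool:
    """Return True if the peptide matches the homology motif (a regular
    expression) and contains every residue of the required multiset."""
    if re.search(motif, peptide) is None:
        return False
    have = Counter(peptide)
    need = Counter(required)
    # Multiset containment: each required residue must occur at least
    # as many times in the peptide as it does in the required multiset.
    return all(have[aa] >= count for aa, count in need.items())


def filter_candidates(candidates, motif, required):
    """Keep only candidate sequences consistent with both priors."""
    return [p for p in candidates if satisfies_priors(p, motif, required)]


if __name__ == "__main__":
    # Hypothetical candidates: motif "PEP" followed by T or K, and at least two E residues.
    print(filter_candidates(["PEPTIDEK", "PEPTIDK", "AAGGK"], r"PEP[TK]", "EE"))
```

Here `PEPTIDK` matches the motif but contains only one `E`, and `AAGGK` fails the motif, so only `PEPTIDEK` survives.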
Motivation: Metabolic networks are a representation of current knowledge about the metabolic reactions available to a given organism. These networks can be placed into various mathematical frameworks, of which the constraint-based framework [1] has received the most attention over the past 15 years. This results in a predictive model of metabolism. Metabolic models can yield predictions of two types:...
Over the past decade gene expression data sets have been generated at an increasing pace. In addition to ever increasing data generation, the biomedical literature is growing exponentially. The PubMed database (Sayers et al., 2010) comprises more than 20 million citations as of October 2010. The goal of our method is the prediction of putative upstream regulators of observed expression changes based...
As whole genome sequencing has become a routine biological experiment, algorithms for the assembly of whole genome shotgun data have become a topic of extensive research, with a plethora of off-the-shelf methods that can reconstruct the genomes of many organisms. Simultaneously, several recently sequenced genomes exhibit very high polymorphism rates. For these organisms genome assembly remains a challenge...
A plethora of epigenetic modifications have been described in the human genome and shown to play diverse roles in gene regulation, cellular differentiation and the onset of disease. Although individual modifications have been linked to the activity levels of various genetic functional elements, their combinatorial patterns are still unresolved and their potential for systematic de novo genome annotation...
In recent years, many algorithms have been developed to narrow down the set of candidate disease genes implicated by genome-wide association studies (GWAS), using knowledge of protein-protein interactions (PPIs). All of these algorithms are based on a common principle: functional association between proteins is correlated with their connectivity/proximity in the PPI network. However, recent research...
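The common proximity principle described above can be made concrete with a toy ranking: score each candidate gene by its shortest-path distance, in hops, to the nearest known disease gene. This is a minimal sketch under stated assumptions, not any specific published algorithm. The function names and the toy network are hypothetical, and the PPI network is assumed to be an unweighted, undirected adjacency dictionary with each edge listed in both directions.

```python
from collections import deque


def bfs_distances(graph, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def rank_candidates(graph, candidates, known_disease_genes):
    """Order candidates by distance to the closest known disease gene
    (closer first); unreachable candidates get infinite distance."""
    scores = {}
    for gene in candidates:
        dist = bfs_distances(graph, gene)
        reachable = [dist[k] for k in known_disease_genes if k in dist]
        scores[gene] = min(reachable) if reachable else float("inf")
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(candidates, key=lambda g: (scores[g], g))


if __name__ == "__main__":
    # Hypothetical toy PPI network: a path A-B-C-D plus an isolated protein E.
    ppi = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": []}
    print(rank_candidates(ppi, ["A", "C", "E"], {"D"}))
```

In the toy network `C` is one hop from the known disease gene `D`, `A` is three hops away, and `E` is disconnected, so the ranking is `C`, `A`, `E`. Plain hop distance is the simplest proximity measure; network-propagation scores are a common alternative.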
The availability of expression quantitative trait loci (eQTL) data can help in understanding the genetic basis of variation in gene expression. However, it has proven difficult to accurately predict functional genetic changes due to low statistical power. To address this challenge, we developed a novel computational approach for combining eQTL data with a complementary regulatory network to identify modules...
This paper presents a graph-based algorithm for identifying complex metabolic pathways in multi-genome scale metabolic data. These complex pathways are called branched pathways because they can arrive at a target compound through combinations of pathways that split compounds into smaller ones, work in parallel with many compounds, and join compounds into larger ones. While most previous work has focused...
Probabilistic approaches for sequence alignment are usually based on pair Hidden Markov Models (HMMs) or Stochastic Context Free Grammars (SCFGs). Recent studies have shown a significant correlation between the content of short indels and their flanking regions, which by definition cannot be modelled by the above two approaches. In this work, we present a context-sensitive indel model based on a pair...
Motivation: Next-generation sequencing technologies have been decreasing costs and increasing the worldwide capacity for sequence production at an unprecedented rate, enabling large-scale projects that aim to sequence almost 2000 genomes [1]. Structural variation detection promises to be one of the key diagnostic tools for cancer and other diseases with genomic origin. In this paper,...
Many applications of computational biology require a variable selection procedure to sift through a large number of input variables and select a smaller number that influence a target variable of interest. For example, in virology, only a small number of viral protein fragments influence the nature of the immune response during viral infection. Due to the large number of variables to be considered,...
Genomic distance between two genomes, i.e., the smallest number of genome rearrangements required to transform one genome into the other, is often used as a measure of evolutionary closeness of the genomes in comparative genomics studies. However, in models that include rearrangements of significantly different “power” such as reversals (which are “weak” and the most frequent rearrangements)...
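As a small illustration of how rearrangement distances are reasoned about, the classical breakpoint argument gives a lower bound on the number of reversals needed to sort a genome. This sketch is a swapped-in textbook technique for unsigned permutations with a single rearrangement type, not the weighted multi-operation model the abstract discusses. The genome is assumed to be a permutation of `1..n`, and the function names are hypothetical.

```python
def count_breakpoints(perm):
    """Count breakpoints in an unsigned permutation of 1..n.

    The permutation is framed by 0 and n + 1; a breakpoint is any pair of
    neighbouring elements whose values do not differ by exactly 1."""
    n = len(perm)
    framed = [0, *perm, n + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(a - b) != 1)


def reversal_lower_bound(perm):
    """A single reversal removes at most two breakpoints, so the reversal
    distance to the identity is at least ceil(b / 2)."""
    b = count_breakpoints(perm)
    return (b + 1) // 2


if __name__ == "__main__":
    for genome in ([1, 2, 3], [3, 2, 1], [2, 4, 1, 3]):
        print(genome, count_breakpoints(genome), reversal_lower_bound(genome))
```

For `[3, 2, 1]` the framed sequence `0 3 2 1 4` has two breakpoints, giving a bound of one reversal, which is tight because reversing the whole segment sorts it.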
Introduction: Multiple sequence alignment (MSA), which is of fundamental importance for comparative genomics, is a difficult and error-prone problem. Therefore, it is essential to measure the reliability of alignments and incorporate it into downstream analyses. Many studies have been conducted to find the extent, cause and effect of alignment errors [4], and to heuristically estimate the quality...
Can we find the family trees, or pedigrees, that relate the haplotypes of a group of individuals? Collecting the genealogical information for how individuals are related is a very time-consuming and expensive process. Methods for automating the construction of pedigrees could streamline this process. While constructing single-generation families is relatively easy given whole genome data, reconstructing...
The ability to design and engineer organisms demands the ability to predict kinetic responses of novel regulatory networks built from well-characterized biological components. Surprisingly, few validated kinetic models of complex regulatory networks have been derived by combining models of the network components. A major bottleneck in producing such models is the difficulty of measuring in vivo rate...
A new method based on a mathematically natural local search framework for max cut is developed to uncover functionally coherent module and BPM motifs in high-throughput genetic interaction data. Unlike previous methods, which also consider physical protein-protein interaction data, our method utilizes genetic interaction data only; this becomes increasingly important as high-throughput genetic interaction...
New second-generation sequencing technology is revolutionizing many biology-related research fields and poses various computational biology challenges. One of them is transcriptome assembly based on RNA-Seq data, which aims at reconstructing all full-length mRNA transcripts simultaneously from millions of short reads. In this paper, we consider three objectives in transcriptome assembly: the maximization...
Haplotypes, as they specify the linkage patterns between dispersed genetic variations, provide important information for understanding the genetics of human traits. However, haplotypes are not directly available from current genotyping platforms, and hence there are extensive investigations of computational methods to recover such information. Two major computational challenges arising in current family-based...
Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this...